# Visual-Language Understanding

VL Rethinker 72B 4bit

VL-Rethinker-72B-4bit is a 4-bit quantized multimodal model for visual question answering. It is based on Qwen2.5-VL-72B-Instruct and has been converted to the MLX format for efficient inference on Apple silicon devices.

Transformers English

A multimodal large language model fine-tuned from Qwen2.5-VL with the Curr-ReFT method, which is designed to improve visual-language understanding and reasoning.

Kosmos 2 Patch14 224

Kosmos-2 is a multimodal large language model that understands images, generates text descriptions of them, and grounds phrases in that text to specific image regions.

Mengzi Oscar Base Caption

A Chinese image captioning model based on the Mengzi-Oscar pretrained model and fine-tuned on the AIC-ICC Chinese image captioning dataset.

Transformers Chinese

© 2025 AIbase